SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments
Gene Ontology (GO) terms are frequently used to score alignments between
protein-protein interaction (PPI) networks. Methods exist to measure the GO
similarity between two proteins in isolation, but pairs of proteins in a
network alignment are not isolated: each pairing is implicitly dependent upon
every other pairing via the alignment itself. Current methods fail to take into
account the frequency of GO terms across the networks, and attempt to account
for common GO terms in an ad hoc fashion by imposing arbitrary rules on when to
"allow" GO terms based on their location in the GO hierarchy, rather than using
readily available frequency information in the PPI networks themselves. Here we
develop a new measure, NetGO, that naturally weighs infrequent, informative GO
terms more heavily than frequent, less informative GO terms, without requiring
arbitrary cutoffs. In particular, NetGO down-weights the score of frequent GO
terms according to their frequency in the networks being aligned. This is a
global measure applicable only to alignments, independent of pairwise GO
measures, in the same sense that the edge-based EC or S3 scores are global
measures of topological similarity independent of pairwise topological
similarities. We demonstrate the superiority of NetGO by creating alignments of
predetermined quality based on homologous pairs of nodes and show that NetGO
correlates with alignment quality much better than any existing GO-based
alignment measures. We also demonstrate that NetGO provides a measure of
taxonomic similarity between species, consistent with existing taxonomic
measures--a feature not shared with existing GO-based network alignment
measures. Finally, we re-score alignments produced by almost a dozen aligners
from a previous study and show that NetGO does a better job than existing
measures at separating good alignments from bad ones.
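The idea of down-weighting frequent GO terms can be illustrated with a small sketch. This is not the published NetGO formula. It is a simple inverse-frequency weighting, assumed here for illustration only, in which a term shared by an aligned pair contributes one over the number of proteins annotated with that term across both networks. The function names and the toy term labels are hypothetical.

```python
from collections import Counter

def term_weights(annotations1, annotations2):
    """Weight each GO term by 1 / (number of proteins carrying it in both networks)."""
    counts = Counter()
    for annotations in (annotations1, annotations2):
        for terms in annotations.values():
            counts.update(terms)
    return {term: 1.0 / n for term, n in counts.items()}

def frequency_weighted_score(alignment, annotations1, annotations2):
    """Sum the weights of GO terms shared by each aligned protein pair.

    alignment: dict mapping a protein in network 1 to a protein in network 2.
    annotations1/2: dict mapping a protein to its set of GO terms.
    Illustrative only; an assumed stand-in, not the NetGO measure itself.
    """
    weights = term_weights(annotations1, annotations2)
    total = 0.0
    for p, q in alignment.items():
        shared = annotations1.get(p, set()) & annotations2.get(q, set())
        total += sum(weights[t] for t in shared)
    return total

# Toy data with hypothetical term labels: "GO:common" is frequent, "GO:rare" is not.
net1 = {"a": {"GO:common", "GO:rare"}, "b": {"GO:common"}}
net2 = {"x": {"GO:common", "GO:rare"}, "y": {"GO:common"}}
good = frequency_weighted_score({"a": "x", "b": "y"}, net1, net2)
bad = frequency_weighted_score({"a": "y", "b": "x"}, net1, net2)
```

In this toy case the alignment that pairs the two proteins sharing the rare term scores higher than the one that matches only on the common term, which is the qualitative behavior the abstract describes.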
New Applications of the Nearest-Neighbor Chain Algorithm
The nearest-neighbor chain algorithm was proposed in the 1980s as a way to speed up certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its application is not limited to clustering. We apply it to a variety of geometric and combinatorial problems. In each case, we show that the nearest-neighbor chain algorithm finds the same solution as a preexisting greedy algorithm, but often with an improved runtime. We obtain speedups over greedy algorithms for Euclidean TSP, Steiner TSP in planar graphs, straight skeletons, a geometric coverage problem, and three stable matching models. In the second part, we study the stable-matching Voronoi diagram, a type of plane partition which combines properties of stable matchings and Voronoi diagrams. We propose political redistricting as an application. We also show that it is impossible to compute this diagram in an algebraic model of computation, and give three algorithmic approaches to overcome this obstacle. One of them is based on the nearest-neighbor chain algorithm, linking the two parts together.
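To make the chain idea concrete, here is a minimal sketch on a toy problem that is not one of the dissertation's applications: matching points in the plane by repeatedly pairing the closest unmatched pair. The chain version follows nearest-neighbor pointers until it finds two points that are each other's nearest neighbor, matches them, and resumes from what is left of the chain. It assumes all pairwise distances are distinct, and it uses linear scans for nearest-neighbor queries, so it shows only that the two strategies agree, not any speedup. Function names are illustrative.

```python
import math

def greedy_matching(points):
    """Baseline: repeatedly match the globally closest pair of unmatched points."""
    remaining = set(range(len(points)))
    pairs = []
    while len(remaining) >= 2:
        a, b = min(
            ((i, j) for i in remaining for j in remaining if i < j),
            key=lambda p: math.dist(points[p[0]], points[p[1]]),
        )
        pairs.append((a, b))
        remaining -= {a, b}
    return pairs

def nn_chain_matching(points):
    """Match mutually nearest neighbors found by following a nearest-neighbor chain."""
    remaining = set(range(len(points)))
    chain = []  # stack: each element's nearest neighbor is the next element
    pairs = []
    while len(remaining) >= 2:
        if not chain:
            chain.append(min(remaining))  # arbitrary starting point
        top = chain[-1]
        nearest = min(
            (j for j in remaining if j != top),
            key=lambda j: math.dist(points[top], points[j]),
        )
        if len(chain) >= 2 and chain[-2] == nearest:
            # top and nearest are mutual nearest neighbors: match and remove them
            chain.pop()
            chain.pop()
            remaining -= {top, nearest}
            pairs.append((min(top, nearest), max(top, nearest)))
        else:
            chain.append(nearest)
    return pairs

pts = [(0, 0), (4, 0), (6, 0), (7, 0), (20, 1), (25, 1)]
chain_pairs = sorted(nn_chain_matching(pts))
greedy_pairs = sorted(greedy_matching(pts))
```

The two functions produce the pairs in a different order but the same set of pairs, which is the kind of equivalence the abstract refers to.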
Defining Equitable Geographic Districts in Road Networks via Stable Matching
We introduce a novel method for defining geographic districts in road
networks using stable matching. In this approach, each geographic district is
defined in terms of a center, which identifies a location of interest, such as
a post office or polling place, and all other network vertices must be labeled
with the center to which they are associated. We focus on defining geographic
districts that are equitable, in that every district has the same number of
vertices and the assignment is stable in terms of geographic distance. That is,
there is no unassigned vertex-center pair such that both would prefer each
other over their current assignments. We solve this problem using a version of
the classic stable matching problem, called symmetric stable matching, in which
the preferences of the elements in both sets obey a certain symmetry. In our
case, we study a graph-based version of stable matching in which nodes are
stably matched to a subset of nodes denoted as centers, prioritized by their
shortest-path distances, so that each center is apportioned a certain number of
nodes. We show that, for a planar graph or road network with nodes and
centers, the problem can be solved in time, which improves
upon the runtime of using the classic Gale-Shapley stable matching
algorithm when is large. Finally, we provide experimental results on road
networks for these algorithms and a heuristic algorithm that performs better
than the Gale-Shapley algorithm for any range of values of .
Comment: 9 pages, 4 figures, to appear in the 25th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL
2017), November 7-10, 2017, Redondo Beach, California, US
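The stability condition can be illustrated with a straightforward baseline, which is not the faster algorithm from the paper. Because nodes and centers rank each other by the same shortest-path distance, processing all node-center pairs in increasing order of distance and assigning a pair whenever the node is free and the center has room yields a stable assignment (ties aside). The sketch below assumes an adjacency-list graph with non-negative edge weights and comparable node labels. The function names and the single shared `quota` parameter are illustrative.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm; graph maps each node to a list of (neighbor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def stable_districts(graph, centers, quota):
    """Assign each node to a center by scanning (distance, node, center) in order."""
    pairs = []
    for c in centers:
        for v, d in shortest_paths(graph, c).items():
            pairs.append((d, v, c))
    pairs.sort()
    assignment = {}
    load = {c: 0 for c in centers}
    for d, v, c in pairs:
        if v not in assignment and load[c] < quota:
            assignment[v] = c
            load[c] += 1
    return assignment

# Path graph 0-1-2-3-4-5 with unit weights, centers at nodes 0 and 1, three nodes each.
path = {i: [] for i in range(6)}
for i in range(5):
    path[i].append((i + 1, 1.0))
    path[i + 1].append((i, 1.0))
districts = stable_districts(path, [0, 1], 3)
```

In this example nodes 4 and 5 are closer to center 1 but end up with center 0, because center 1 fills its quota with nodes it prefers.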
Stable-Matching Voronoi Diagrams: Combinatorial Complexity and Algorithms
We study algorithms and combinatorial complexity bounds for stable-matching Voronoi diagrams, where a set, S, of n point sites in the plane determines a stable matching between the points in R^2 and the sites in S such that (i) the points prefer sites closer to them and sites prefer points closer to them, and (ii) each site has a quota indicating the area of the set of points that can be matched to it. Thus, a stable-matching Voronoi diagram is a solution to the classic post office problem with the added (realistic) constraint that each post office has a limit on the size of its jurisdiction. Previous work provided existence and uniqueness proofs, but did not analyze the diagram's combinatorial or algorithmic complexity. We show that a stable-matching Voronoi diagram of n sites has O(n^{2+epsilon}) faces and edges, for any epsilon>0, and show that this bound is almost tight by giving a family of diagrams with Theta(n^2) faces and edges. We also provide a discrete algorithm for constructing it in O(n^3+n^2f(n)) time, where f(n) is the runtime of a geometric primitive that can be performed in the real-RAM model or can be approximated numerically. This is necessary, as the diagram cannot be computed exactly in an algebraic model of computation.
SANA: simulated annealing far outperforms many other search algorithms for biological network alignment
Summary: Every alignment algorithm consists of two orthogonal components: an objective function M measuring the quality of an alignment, and a search algorithm that explores the space of alignments looking for ones scoring well according to M. We introduce a new search algorithm called SANA (Simulated Annealing Network Aligner) and apply it to protein-protein interaction networks using S3 as the topological measure. Compared against 12 recent algorithms, SANA produces 5-10 times as many correct node pairings as the others when the correct answer is known. We expose an anti-correlation in many existing aligners between their ability to produce good topological vs. functional similarity scores, whereas SANA usually outscores other methods in both measures. If given the perfect objective function encoding the identity mapping, SANA quickly converges to the perfect solution while many other algorithms falter. We observe that when aligning networks with a known mapping and optimizing only S3, SANA creates alignments that are not perfect and yet whose S3 scores match that of the perfect alignment. We call this phenomenon saturation of the topological score. Saturation implies that a measure's correlation with alignment correctness falters before the perfect alignment is reached. This, combined with SANA's ability to produce the perfect alignment if given the perfect objective function, suggests that better objective functions may lead to dramatically better alignments. We conclude that future work should focus on finding better objective functions, and offer SANA as the search algorithm of choice.
Availability and implementation: Software available at http://sana.ics.uci.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
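The split between objective function and search algorithm can be sketched in a few lines. The code below is a toy simulated-annealing loop written for illustration, not SANA's implementation. It assumes undirected graphs given as edge lists without duplicates, recomputes the score from scratch at every step, and uses a geometric temperature schedule. All of those choices, and all names and parameters, are assumptions made for this sketch. The objective is S3, computed as aligned edges divided by (edges in the first network + edges of the second network induced on the image - aligned edges).

```python
import math
import random

def s3_score(edges1, edges2, mapping):
    """S3 = aligned / (|E1| + induced edges of G2 on the image - aligned)."""
    e2 = {frozenset(e) for e in edges2}
    image = set(mapping.values())
    aligned = sum(1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in e2)
    induced = sum(1 for e in e2 if e <= image)
    denom = len(edges1) + induced - aligned
    return aligned / denom if denom else 0.0

def anneal(nodes1, nodes2, edges1, edges2, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for an injective mapping nodes1 -> nodes2 that scores well under S3."""
    rng = random.Random(seed)
    targets = list(nodes2)
    rng.shuffle(targets)
    mapping = dict(zip(nodes1, targets))
    unused = targets[len(nodes1):]
    score = s3_score(edges1, edges2, mapping)
    best, best_score = dict(mapping), score
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        cand, cand_unused = dict(mapping), list(unused)
        u = rng.choice(nodes1)
        if cand_unused and rng.random() < 0.5:
            # move: re-map u to a currently unused node of the second network
            i = rng.randrange(len(cand_unused))
            cand[u], cand_unused[i] = cand_unused[i], cand[u]
        else:
            # move: swap the images of two nodes of the first network
            v = rng.choice(nodes1)
            cand[u], cand[v] = cand[v], cand[u]
        new_score = s3_score(edges1, edges2, cand)
        # accept improvements always, worsening moves with Boltzmann probability
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            mapping, unused, score = cand, cand_unused, new_score
            if score > best_score:
                best, best_score = dict(mapping), score
    return best, best_score

path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
path_nodes = [0, 1, 2, 3, 4]
best, best_score = anneal(path_nodes, path_nodes + [5], path_edges,
                          path_edges + [(4, 5)], steps=2000)
```

Swapping `s3_score` for a different function changes the objective without touching the search loop, which is the orthogonality the summary describes.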
Automatic evaluation of top-down predictive parsing
We develop efficient methods to check whether two given Context-Free Grammars (CFGs) are transformed into parsers that recognize the same language and construct the same Abstract Syntax Trees (ASTs) for each input. In this setting, we consider a model of top-down predictive parser generator with directives for AST construction that is a simplified variant of PCCTS/ANTLR3. As an application, we implement an evaluator for an online judge with educational purposes in the context of a Compilers course.
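As background on top-down predictive parsing, and not as a rendering of the paper's equivalence-checking method, the following sketch computes FIRST sets, a standard ingredient that predictive parsers use to pick a production from the next input token. The grammar encoding (a dict from nonterminal to a list of tuples of symbols, with the empty tuple for an epsilon production) and the use of the empty string as the epsilon marker are assumptions of this sketch.

```python
EPS = ""  # marker for the empty string (epsilon); an assumption of this sketch

def first_sets(grammar):
    """Compute FIRST sets by fixed-point iteration.

    grammar: dict mapping each nonterminal to a list of productions, where a
    production is a tuple of symbols. Symbols that are not keys of the dict
    are treated as terminals.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                before = len(first[nt])
                all_nullable = True
                for symbol in production:
                    if symbol in grammar:
                        first[nt] |= first[symbol] - {EPS}
                        if EPS not in first[symbol]:
                            all_nullable = False
                            break
                    else:
                        first[nt].add(symbol)  # a terminal starts the production
                        all_nullable = False
                        break
                if all_nullable:
                    first[nt].add(EPS)  # every symbol can derive epsilon
                if len(first[nt]) != before:
                    changed = True
    return first

# Classic expression grammar: E -> T E' ; E' -> + T E' | epsilon ; T -> id | ( E )
expr_grammar = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",), ("(", "E", ")")],
}
expr_first = first_sets(expr_grammar)
```

For this grammar, the two productions of `E'` have disjoint FIRST sets, so a parser can choose between them with one token of lookahead.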